Abstract. This article investigates deterministic design matrices $X$ for the fundamental problems of error prediction and variable selection given observations $y = X\beta^* + z$, where $z$ is a stochastic error term. In this paper, deterministic design matrices are derived from unbalanced expander graphs, and we show that it is possible to accurately estimate the prediction $X\beta^*$ and the target vector $\beta^*$ using computationally tractable algorithms.

Using a result of Berinde et al. (see [BGI+08]), we show that for any adjacency matrix of an unbalanced expander graph and any target vector $\beta^*$, the lasso ($\ell_1$-penalized least squares) and the Dantzig selector ($\ell_1$-minimization under an $\ell_\infty$ constraint) satisfy oracle inequalities in error prediction and variable selection involving the $s$ largest (in magnitude) coefficients of $\beta^*$, i.e. upper bounds in terms of the best $s$-sparse approximation.

Using recent results on Parvaresh-Vardy codes, we present a construction of deterministic designs. Furthermore, we prove that these designs are almost optimal. Indeed, they provide error prediction and variable selection with an accuracy which is the best, up to an explicit factor, one could expect knowing the support of the target $\beta^*$.



1. Introduction 

This article focuses on the problem of processing high-dimensional data. Our framework is broadly that of compressed sensing, where one seeks to acquire the main information of a signal directly from a minimal number of measurements. The field of applications is wide and encompasses compressive imaging, MRI (magnetic resonance imaging), NMR (nuclear magnetic resonance) spectroscopy, radar design, real-number error correction, communications and high-speed analog-to-digital conversion [Can06]. Beyond this wide spectrum of applications, a fundamental question is to find efficient design matrices for common estimators. Unlike the traditional approach, which relies on random matrices, we aim at giving deterministic design matrices.

Our present work is based on unbalanced expander graphs [BI08, JXHC09], which give outstanding explicit design matrices. More precisely, we present a deterministic construction of design based on Parvaresh-Vardy codes and the recent work of Guruswami et al. [GUV09]. This construction can be found in Section 3.3. We also show the optimality of oracle inequalities in this framework (see Section 2.2). Furthermore, our oracle inequalities are derived from two efficiently verifiable conditions (satisfied by unbalanced expander graphs).
1.1. The deterministic design matrix. It has recently emerged that compressed sensing and coding theory share similar properties. In 2007, B. Hassibi and W. Xu [HX07] gave a generalization of expander codes [SS96] (which are linear error-correcting codes derived from expander graphs) to compressed sensing. Furthermore, Berinde et al. [BGI+08] pointed out that adjacency matrices of unbalanced expander graphs can be used to efficiently recover sparse vectors. In their fundamental article [CRT06], E. Candès, J. Romberg, and T. Tao showed that their RIP2 property is a sufficient condition that guarantees efficient signal reconstruction. It can be stated as follows.

Definition 1 (RIP2). A matrix $X \in \mathbb{R}^{n\times p}$ satisfies the RIP2 property of order $s$ with constant $0 < \delta < 1$ if and only if

$$\forall \gamma \in \mathbb{R}^p \text{ such that } |\mathrm{supp}(\gamma)| \le s:\quad (1-\delta)\|\gamma\|_2^2 \le \|X\gamma\|_2^2 \le (1+\delta)\|\gamma\|_2^2,$$

where $|\mathrm{supp}(\gamma)|$ denotes the size of the support (i.e. the set of the indices of the nonzero coefficients) of the vector $\gamma$.
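For small matrices, the RIP2 constant of Definition 1 can be computed by brute force over all supports. The sketch below is an illustration only: it assumes NumPy, uses the squared-norm form of the definition, and the helper name `rip2_constant` is hypothetical. Its cost grows like $\binom{p}{s}$, so it is not a practical certificate.

```python
import itertools
import numpy as np

def rip2_constant(X: np.ndarray, s: int) -> float:
    """Smallest delta with (1-delta)||g||_2^2 <= ||Xg||_2^2 <= (1+delta)||g||_2^2
    for every g with at most s nonzero entries (brute force, toy sizes only)."""
    p = X.shape[1]
    delta = 0.0
    # By Cauchy interlacing, supports of size exactly s give the extreme eigenvalues.
    for S in itertools.combinations(range(p), s):
        cols = X[:, list(S)]
        eig = np.linalg.eigvalsh(cols.T @ cols)  # ascending eigenvalues of X_S^T X_S
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta
```

For instance, two identical unit columns give $\delta = 1$ at order $s = 2$, so such a matrix has no RIP2 of order 2.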

Intuitively, it states that the design matrix preserves the $\ell_2$-norm of $s$-sparse vectors (i.e. it is an almost isometry on the space of sparse vectors). This property implies that exact recovery using $\ell_1$-minimization (i.e. basis pursuit) is possible.

In 2008, Berinde et al. [BGI+08] showed that the adjacency matrix of an expander graph satisfies a very similar property, called the restricted isometry property in the $\ell_1$-norm (the so-called RIP1, see Section 3). They used this property to show that exact recovery by basis pursuit (with unbalanced expander graph designs) is still possible. They proved a useful uncertainty principle connecting the mass on small subsets, namely $\|\gamma_S\|_1$, to the whole mass $\|\gamma\|_1$.

Lemma 1.1 ([BGI+08], Uncertainty Principle). Let $X \in \mathbb{R}^{n\times p}$ be the renormalized adjacency matrix of a $(2s, \varepsilon)$-unbalanced expander with $\varepsilon < 1/4$. Then $X$ satisfies the following uncertainty principle:

$$\forall \gamma \in \mathbb{R}^p,\ \forall S \subset \{1,\dots,p\} \text{ s.t. } |S| \le s,\quad (1-4\varepsilon)\|\gamma_S\|_1 \le \|X\gamma\|_1 + 2\varepsilon\|\gamma_{S^c}\|_1,$$

where $\gamma_S$ denotes the vector whose $i$-th entry is equal to $\gamma_i$ if $i \in S$ and $0$ otherwise. In particular, for $\varepsilon \le 1/8$, it satisfies the Uncertainty Principle condition (1).
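The passage from the inequality of Lemma 1.1, $(1-4\varepsilon)\|\gamma_S\|_1 \le \|X\gamma\|_1 + 2\varepsilon\|\gamma_{S^c}\|_1$, to condition (1) is a division by $1-4\varepsilon > 0$:

$$\|\gamma_S\|_1 \le \frac{1}{1-4\varepsilon}\|X\gamma\|_1 + \frac{2\varepsilon}{1-4\varepsilon}\|\gamma_{S^c}\|_1, \qquad \varepsilon = \tfrac18:\quad \frac{1}{1-1/2} = 2,\quad \frac{1/4}{1/2} = \frac12 .$$

Both coefficients increase with $\varepsilon$ on $[0, 1/4)$, so every $\varepsilon \le 1/8$ yields condition (1).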

Expander graphs are presented in Section 3 (the definition of a "$(2s, \varepsilon)$-unbalanced expander with $\varepsilon \le 1/8$" is postponed to that section).

1.2. Our assumptions on the design. Our statistical analysis uses two properties of the renormalized adjacency matrix of an expander graph (see Section 3). Following the previous lemma, we assume that:

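Uncertainty Principle condition (of order $s$): The design matrix $X \in \mathbb{R}^{n\times p}$ satisfies the inequality:

$$\forall \gamma \in \mathbb{R}^p,\ \forall S \subset \{1,\dots,p\} \text{ s.t. } |S| \le s,\quad \|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \tfrac{1}{2}\|\gamma_{S^c}\|_1. \tag{1}$$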

Moreover, we are concerned with renormalized adjacency matrices. Namely:

$\ell_1$-normalization condition: All the columns of the design matrix $X \in \mathbb{R}^{n\times p}$ have $\ell_1$-norm equal to 1.
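Both conditions can be probed numerically. The sketch below assumes NumPy, and its helper names (`is_l1_normalized`, `up_slack`) are hypothetical. The first helper checks the $\ell_1$-normalization condition exactly. The second evaluates the slack of condition (1) for one given vector $\gamma$: a negative value exhibits a violation, while nonnegative values on sampled vectors are only a sanity check, since (1) quantifies over all $\gamma$.

```python
import numpy as np

def is_l1_normalized(X: np.ndarray, tol: float = 1e-12) -> bool:
    """l1-normalization condition: every column of X has l1-norm 1."""
    return bool(np.all(np.abs(np.abs(X).sum(axis=0) - 1.0) <= tol))

def up_slack(X: np.ndarray, gamma: np.ndarray, s: int) -> float:
    """RHS minus LHS of condition (1): 2||X g||_1 + ||g_{S^c}||_1 / 2 - ||g_S||_1.

    For a fixed g the worst S is the set of its s largest entries in magnitude,
    because ||g_S||_1 - ||g_{S^c}||_1 / 2 = 1.5 ||g_S||_1 - ||g||_1 / 2 grows with ||g_S||_1.
    """
    absg = np.abs(gamma)
    S = np.argsort(absg)[::-1][:s]        # indices of the s largest |g_i|
    mass_S = absg[S].sum()
    mass_Sc = absg.sum() - mass_S
    return 2.0 * np.abs(X @ gamma).sum() + 0.5 * mass_Sc - mass_S
```

Two identical columns give a kernel vector concentrated on two coordinates, so $\gamma = (1, -1)$ with $s = 1$ already violates (1).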
Unless otherwise specified, we assume that the design matrix $X \in \mathbb{R}^{n\times p}$ satisfies these two conditions. Let us emphasize that the renormalized adjacency matrix of a $(2s, \varepsilon)$-unbalanced expander graph (with $\varepsilon \le 1/8$) satisfies them. Hence this framework covers unbalanced expander graphs.

1.2.1. The Uncertainty Principle condition and the nullspace property. An important property exploited in this article is the uncertainty principle presented in (1). In particular, it yields

$$\forall S \subset \{1,\dots,p\},\ |S| \le s,\ \forall \gamma \in \ker(X):\quad \|\gamma_S\|_1 \le \tfrac{1}{2}\|\gamma_{S^c}\|_1,$$

where $\ker(X)$ denotes the kernel of the matrix $X \in \mathbb{R}^{n\times p}$. This last inequality means that the vectors of the kernel cannot be concentrated on small subsets. In particular, it implies the nullspace property [CDD09] of order $s$, namely:

$$\forall \gamma \in \ker(X)\setminus\{0\},\ \forall S \subset \{1,\dots,p\},\ |S| \le s:\quad \|\gamma_S\|_1 < \|\gamma_{S^c}\|_1. \tag{2}$$

It is now standard that the nullspace property (2) is a necessary and sufficient condition for the following proposition:

The basis pursuit estimator

$$\beta^{bp} \in \arg\min_{\beta\in\mathbb{R}^p} \|\beta\|_1 \quad \text{such that} \quad X\beta = X\beta^*,$$

exactly recovers every target vector $\beta^* \in \mathbb{R}^p$ whose support has size at most $s$.

Thus the Uncertainty Principle condition (1) is a sufficient condition for basis pursuit.
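Basis pursuit is a linear program. The sketch below splits $\beta = u - v$ with $u, v \ge 0$ and calls `scipy.optimize.linprog` with the HiGHS backend (SciPy 1.6 or later is assumed to be available). The helper name `basis_pursuit` and the toy $2\times 3$ matrix in the usage note are hypothetical illustrations, not an expander design.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve min ||b||_1 s.t. X b = y as a linear program.

    With b = u - v and u, v >= 0, sum(u + v) equals ||b||_1 at the optimum.
    """
    p = X.shape[1]
    c = np.ones(2 * p)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])          # X u - X v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]
```

With $X = \begin{pmatrix}1&0&1/2\\0&1&1/2\end{pmatrix}$ and $\beta^* = e_1$, every other feasible point has strictly larger $\ell_1$-norm, so the program returns $e_1$.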

1.2.2. The Uncertainty Principle condition and the Restricted Eigenvalue assumption. In the same way, condition (1) implies the Restricted Eigenvalue assumption of P.J. Bickel, Y. Ritov, and A.B. Tsybakov [BRT09]. Indeed, this assumption concerns the smallest eigenvalue of the design restricted to a cone, namely a lower bound on $\|X\gamma\|_2$ over a cone of vectors.

Definition 2 (restricted eigenvalue RE$(s, c_0)$). A design matrix $X \in \mathbb{R}^{n\times p}$ satisfies the restricted eigenvalue assumption with the parameters $s$ and $c_0$ if and only if

$$\kappa(s,c_0) = \min_{\substack{S\subset\{1,\dots,p\}\\ |S|\le s}}\ \min_{\substack{\gamma\neq 0\\ \|\gamma_{S^c}\|_1 \le c_0\|\gamma_S\|_1}} \frac{\|X\gamma\|_2}{\sqrt{n}\,\|\gamma_S\|_2} > 0.$$

The constant $\kappa(s,c_0)$ is called the $(s,c_0)$-restricted eigenvalue.

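It is not difficult to check that the Uncertainty Principle condition of order $s$ (see (1)) implies that:

$$\forall c < 2,\ \forall S \subset \{1,\dots,p\} \text{ s.t. } |S| \le s,\ \forall \gamma \neq 0 \text{ s.t. } \|\gamma_{S^c}\|_1 \le c\,\|\gamma_S\|_1:\quad X\gamma \neq 0.$$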
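The check takes one line. Let $c < 2$, $|S| \le s$ and $\gamma \neq 0$ with $\|\gamma_{S^c}\|_1 \le c\,\|\gamma_S\|_1$. Condition (1) gives

$$\|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \tfrac12\|\gamma_{S^c}\|_1 \le 2\|X\gamma\|_1 + \tfrac{c}{2}\|\gamma_S\|_1 \quad\Longrightarrow\quad \left(1-\tfrac{c}{2}\right)\|\gamma_S\|_1 \le 2\|X\gamma\|_1 .$$

The cone constraint forces $\gamma_S \neq 0$ (otherwise $\gamma = 0$). Since $1 - c/2 > 0$, it follows that $X\gamma \neq 0$.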

A compactness argument gives that $\kappa(s,c_0) > 0$ for all $0 < c_0 < 2$. As a consequence, the Uncertainty Principle condition (1) of order $s$ implies the Restricted Eigenvalue assumption with the parameters $s$ and $c_0$, for all $0 < c_0 < 2$. However, we have no quantitative lower bound on the $(s,c_0)$-restricted eigenvalue of $X$, for any $c_0 > 0$. Hence the Restricted Eigenvalue approach fails to give oracle inequalities in this case.
1.2.3. The Uncertainty Principle condition and the $H_{s,1}(1/3)$ condition. In parallel to our work, A. Juditsky and A. Nemirovski [JN10] gave an efficiently verifiable condition ensuring the performance of the lasso and the Dantzig selector. Although matrices constructed from expander graphs are not specifically studied in [JN10], they study uncertainty conditions similar to the one stated in equation (1). A careful reading of their article shows that their $H_{s,1}(1/3)$ condition is related to the Uncertainty Principle condition (1). Indeed, the Uncertainty Principle condition can be equivalently stated as

$$\|\gamma_S\|_1 \le \tfrac{4}{3}\|X\gamma\|_1 + \tfrac{1}{3}\|\gamma\|_1,\quad \forall \gamma \in \mathbb{R}^p,\ \forall S \subset \{1,\dots,p\} \text{ s.t. } |S| \le s.$$
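The equivalence follows by substituting $\|\gamma_{S^c}\|_1 = \|\gamma\|_1 - \|\gamma_S\|_1$ into (1). Each step below is reversible:

$$\|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \tfrac12\big(\|\gamma\|_1 - \|\gamma_S\|_1\big) \iff \tfrac32\|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \tfrac12\|\gamma\|_1 \iff \|\gamma_S\|_1 \le \tfrac43\|X\gamma\|_1 + \tfrac13\|\gamma\|_1 .$$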

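This leads to the $H_{s,1}(1/3)$ condition (see Section 5.3 in [JN10]) for the lasso and the Dantzig selector, namely:

$$\|\gamma_S\|_1 \le \hat\lambda\, s\,\|X\gamma\|_2 + \tfrac{1}{3}\|\gamma\|_1,\quad \forall \gamma \in \mathbb{R}^p,\ \forall S \subset \{1,\dots,p\} \text{ s.t. } |S| \le s,$$

where, since $\|X\gamma\|_1 \le \sqrt{n}\,\|X\gamma\|_2$,

$$\hat\lambda = \frac{4\sqrt{n}}{3s}. \tag{3}$$

Their result (Proposition 9, [JN10]) is similar to (11) and (19) in terms of regular consistency (see the discussion in Section 2.2.4). However, let us emphasize that the results in [JN10] concern only regular consistency for $\ell_1$-recovery. In particular, they contain no result on error prediction.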

1.3. The lasso and the Dantzig selector. Two of the most common problems in statistics are to estimate the response $X\beta^* \in \mathbb{R}^n$ (error prediction) and the target $\beta^* \in \mathbb{R}^p$ from the linear model

$$y = X\beta^* + z, \tag{4}$$

where $X \in \mathbb{R}^{n\times p}$ is a design matrix and $z \in \mathbb{R}^n$ a noise vector. We assume that $z = (z_i)_{i=1}^n$ is a centered Gaussian noise with variance $\sigma^2$, i.e. the $z_i$'s are $\mathcal{N}(0,\sigma^2)$-distributed and may be correlated.



1.3.1. Bound on the noise. Denote $\Lambda = 2\sigma\sqrt{\log n}$. As shown in Section 4, the $\ell_1$-normalization condition implies that

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda\right) \ge 1 - \eta_n,$$

where $\eta_n$ depends only on $n$. It is now standard that the parameter $\Lambda$ (up to a constant) is a natural lower bound on the tuning parameters of the lasso and the Dantzig selector.

1.3.2. The lasso. In his fundamental article [Tib96], R. Tibshirani pointed out that the geometry of the $\ell_1$-norm produces coefficients that are exactly 0. The lasso estimator is

$$\beta^\ell \in \arg\min_{\beta\in\mathbb{R}^p}\left\{\|y - X\beta\|_2^2 + \lambda\|\beta\|_1\right\}, \tag{5}$$

where $\lambda$ is a tuning parameter. Intuitively, the lasso estimator lies at the point of contact between the smooth residual sum of squares function and the convex, piecewise-flat constraint surface. This point of contact is very likely to belong to a low-dimensional face of an $\ell_1$-ball (i.e. the simplex generated by $k \ll p$ extremal points). Thus the estimator is likely to have many coefficients that are exactly 0 (see [Tib96]).
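The lasso (5) has no closed form in general. A standard way to compute it is proximal gradient descent (ISTA). This is a generic solver, not a method from this paper. The sketch below assumes NumPy, uses the exact scaling of (5) (no $1/2$ or $1/n$ factor), and its helper names are hypothetical.

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Componentwise sign(v) * max(|v| - t, 0): the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 5000) -> np.ndarray:
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 (objective (5)) by ISTA."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient -2 X^T (y - X b)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (y - X @ b)
        b = soft_threshold(b - grad / L, lam / L)  # gradient step, then l1 proximal step
    return b
```

When $X$ is the identity, (5) separates coordinatewise and the solution is the soft-thresholding of $y$ at level $\lambda/2$. For example, $y = (3, -0.5, 1)$ with $\lambda = 2$ gives $(2, 0, 0)$, which shows the exact zeros discussed above.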

1.3.3. The Dantzig selector. In 2005, E. Candès and T. Tao [CT07a] introduced a new estimator, the Dantzig selector. This estimator is the solution to the $\ell_1$-regularization problem

$$\beta^d \in \arg\min_{\beta\in\mathbb{R}^p} \|\beta\|_1 \quad \text{s.t.} \quad \|X^\top(y - X\beta)\|_\infty \le \lambda, \tag{6}$$

where $\|\cdot\|_\infty$ is the $\ell_\infty$-norm and $\lambda$ a tuning parameter. We consider tuning parameters $\lambda$ such that $\lambda \ge \Lambda$. This last inequality implies that $\beta^*$ is feasible with high probability (see Section 4).
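Like basis pursuit, problem (6) is a linear program. The sketch below splits $\beta = u - v$ and writes the $\ell_\infty$ constraint as two blocks of linear inequalities. It assumes SciPy's `linprog` with the HiGHS backend is available, and the helper name `dantzig_selector` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Solve (6): min ||b||_1 s.t. ||X^T (y - X b)||_inf <= lam, as a linear program.

    With b = u - v (u, v >= 0) and G = X^T X, the constraint reads
    G u - G v <= X^T y + lam   and   -G u + G v <= lam - X^T y.
    """
    p = X.shape[1]
    G = X.T @ X
    Xty = X.T @ y
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([Xty + lam, lam - Xty])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]
```

With $X$ the identity, the constraint becomes $|y_i - \beta_i| \le \lambda$, and the solution is soft-thresholding at level $\lambda$.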

1.4. Organization of the paper. The outline of the paper is as follows. The next section gives oracle inequalities that are optimal up to an explicit factor. Section 3 presents unbalanced expander graphs. Finally, the last section shows that the "$\ell_1$-normalization condition" implies the "Bound on the noise condition" appearing in the theorems of Section 2.

2. Oracle inequalities for the lasso and the Dantzig selector 

This section is devoted to oracle inequalities. They are established from the 
Uncertainty Principle condition (1). In particular, Section 2.2 shows that they are 
optimal up to a known multiplicative factor. 

2.1. Error prediction and variable selection for the Lasso. The lasso estimator is 
defined by (5). 

Theorem 2.1 (Error prediction for the lasso). Let $X \in \mathbb{R}^{n\times p}$ be a design matrix such that

• it satisfies the Uncertainty Principle condition of order $s$ (see (1)),

• it satisfies the Bound on the noise condition:

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda\right) \ge 1 - \eta_n, \tag{7}$$

where $\eta_n$ is some known function that depends only on $n$.

Let $\beta^* \in \mathbb{R}^p$ be any vector and $S \subset \{1,\dots,p\}$ the indices of its $s$ largest (in magnitude) coefficients. Take $\lambda \ge 6\Lambda$. Then it holds

$$\|X\beta^* - X\beta^\ell\|_2^2 + (\lambda - 6\Lambda)\|\beta^\ell_{S^c} - \beta^*_{S^c}\|_1 \le 4\lambda\left(2\lambda n + \|\beta^*_{S^c}\|_1\right),$$

with probability at least $1 - \eta_n$.

Remark. In Section 4, we show that the $\ell_1$-normalization condition implies the Bound on the noise condition.
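Proof. Set $\gamma = \beta^* - \beta^\ell$. On the event $\{\|X^\top z\|_\infty \le \Lambda\}$, we have the standard inequality:

$$\begin{aligned}
\|X\gamma\|_2^2 + \lambda\|\beta^\ell\|_1 &= \|y - z - X\beta^\ell\|_2^2 + \lambda\|\beta^\ell\|_1\\
&= \|y - X\beta^\ell\|_2^2 - 2z^\top(y - X\beta^\ell) + \|z\|_2^2 + \lambda\|\beta^\ell\|_1\\
&= \|y - X\beta^\ell\|_2^2 - 2(X^\top z)^\top\gamma - \|z\|_2^2 + \lambda\|\beta^\ell\|_1\\
&\le \|y - X\beta^\ell\|_2^2 + 2\Lambda\|\gamma\|_1 - \|z\|_2^2 + \lambda\|\beta^\ell\|_1 && (8)\\
&\le \|y - X\beta^*\|_2^2 + 2\Lambda\|\gamma\|_1 - \|z\|_2^2 + \lambda\|\beta^*\|_1 && (9)\\
&= 2\Lambda\|\gamma\|_1 + \lambda\|\beta^*\|_1,
\end{aligned}$$

using the event $\{\|X^\top z\|_\infty \le \Lambda\}$ in inequality (8) and the definition of the lasso estimator in inequality (9). It follows that

$$\begin{aligned}
\|X\gamma\|_2^2 + \lambda\|\beta^\ell_{S^c}\|_1 - 2\Lambda\|\gamma_{S^c}\|_1 &\le 2\Lambda\|\gamma_S\|_1 + \lambda\left(\|\beta^*_S\|_1 - \|\beta^\ell_S\|_1\right) + \lambda\|\beta^*_{S^c}\|_1\\
&\le (\lambda + 2\Lambda)\|\gamma_S\|_1 + \lambda\|\beta^*_{S^c}\|_1.
\end{aligned}$$

Since $\|\beta^\ell_{S^c}\|_1 \ge \|\gamma_{S^c}\|_1 - \|\beta^*_{S^c}\|_1$, we get $\|X\gamma\|_2^2 + (\lambda - 2\Lambda)\|\gamma_{S^c}\|_1 \le (\lambda + 2\Lambda)\|\gamma_S\|_1 + 2\lambda\|\beta^*_{S^c}\|_1$. Using the Uncertainty Principle (1), i.e. $\|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \frac12\|\gamma_{S^c}\|_1$, and $\|X\gamma\|_1 \le \sqrt{n}\,\|X\gamma\|_2$, it holds

$$\begin{aligned}
\|X\gamma\|_2^2 + \frac{\lambda - 6\Lambda}{2}\|\gamma_{S^c}\|_1 &\le 2(\lambda + 2\Lambda)\|X\gamma\|_1 + 2\lambda\|\beta^*_{S^c}\|_1\\
&\le 2(\lambda + 2\Lambda)\sqrt{n}\,\|X\gamma\|_2 + 2\lambda\|\beta^*_{S^c}\|_1.
\end{aligned}$$

Since $2(\lambda + 2\Lambda)\sqrt{n}\,\|X\gamma\|_2 \le \frac12\|X\gamma\|_2^2 + 2(\lambda + 2\Lambda)^2 n$, we deduce the inequality $\|X\gamma\|_2^2 + (\lambda - 6\Lambda)\|\gamma_{S^c}\|_1 \le 4(\lambda + 2\Lambda)^2 n + 4\lambda\|\beta^*_{S^c}\|_1$. Since $\lambda \ge 6\Lambda$, we have $(\lambda + 2\Lambda)^2 \le \frac{16}{9}\lambda^2 \le 2\lambda^2$, and we get

$$\|X\beta^* - X\beta^\ell\|_2^2 + (\lambda - 6\Lambda)\|\beta^\ell_{S^c} - \beta^*_{S^c}\|_1 \le 4\lambda\left(2\lambda n + \|\beta^*_{S^c}\|_1\right).$$

Since the event $\{\|X^\top z\|_\infty \le \Lambda\}$ has probability at least $1 - \eta_n$, this concludes the proof. □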

If $\beta^*$ is $s$-sparse (i.e. it has at most $s$ nonzero coefficients), we have the following result.

Corollary (sparse target). Let $\beta^* \in \mathbb{R}^p$ be an $s$-sparse vector and $S \subset \{1,\dots,p\}$ its support. Take $\lambda \ge 6\Lambda$. Then it holds

$$\|X\beta^* - X\beta^\ell\|_2^2 + (\lambda - 6\Lambda)\|\beta^\ell_{S^c}\|_1 \le 8\lambda^2 n,$$

with probability at least $1 - \eta_n$.

• In the case $\lambda = 6\Lambda$, we derive the error prediction:

$$\|X\beta^* - X\beta^\ell\|_2 \le 24\sqrt{2}\,\sigma\sqrt{n\log n}, \tag{10}$$

with probability at least $1 - \eta_n$.

• In the case $\lambda = 7\Lambda$, we derive:

$$\|\beta^\ell_{S^c}\|_1 \le 392\,\Lambda n = 784\,\sigma n\sqrt{\log n}, \tag{11}$$

with probability at least $1 - \eta_n$.
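The constants in the two cases follow from the corollary with $\Lambda = 2\sigma\sqrt{\log n}$. The arithmetic, spelled out:

$$\lambda = 6\Lambda:\qquad \|X\beta^* - X\beta^\ell\|_2^2 \le 8\cdot 36\,\Lambda^2 n = 288\cdot 4\,\sigma^2 n\log n = 1152\,\sigma^2 n\log n = \left(24\sqrt{2}\,\sigma\sqrt{n\log n}\right)^2,$$

$$\lambda = 7\Lambda:\qquad \Lambda\,\|\beta^\ell_{S^c}\|_1 = (\lambda - 6\Lambda)\|\beta^\ell_{S^c}\|_1 \le 8\cdot 49\,\Lambda^2 n \quad\Longrightarrow\quad \|\beta^\ell_{S^c}\|_1 \le 392\,\Lambda n .$$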
Our oracle inequalities bound the prediction error $\|X\beta^* - X\beta^\ell\|_2$ and the variable selection error $\|\beta^\ell_{S^c}\|_1$ (see Section 2.2.5).

2.2. Results in the Parvaresh-Vardy code framework. Using the Parvaresh-Vardy codes [GUV09], we have the following result (see Theorem 3.2). There exists a universal constant $\theta > 0$ such that, for all $\alpha > 0$ and $p > s > 0$, there exists an explicit renormalized adjacency matrix $X \in \mathbb{R}^{n\times p}$ of an unbalanced expander graph (with expansion constant $\varepsilon = 1/8$) such that:

(i) $n \le s^{1+\alpha}(\theta\log p\log s)^{2+2/\alpha}$,

(ii) the left degree $d$ of the graph satisfies $d \le (\theta\log p\log s)^{1+1/\alpha}$,

(iii) the matrix $X$ satisfies the $\ell_1$-normalization condition,

(iv) the columns $X_i \in \mathbb{R}^n$ of the matrix $X \in \mathbb{R}^{n\times p}$ are such that $\|X_i\|_2 = 1/\sqrt{d}$,

(v) the matrix $X$ satisfies the Uncertainty Principle condition (1).

Observe that conditions (i) and (ii) are derived from Theorem 3.2, conditions (iii) and (iv) are derived from the definition of a renormalized adjacency matrix (20), and condition (v) is given by Lemma 1.1 (where the expansion constant is $\varepsilon = 1/8$). Hence, this framework is relevant in the case of explicit expander graphs.

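In Section 2.2, we assume that the design $X \in \mathbb{R}^{n\times p}$ satisfies the five above conditions. In particular, it holds

$$\forall i \in \{1,\dots,p\},\quad \|X_i\|_2 = \frac{1}{\sqrt{d}} \ge (\theta\log p\log s)^{-\frac{1}{2}\left(1+\frac{1}{\alpha}\right)}.$$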
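The bounds (i), (ii) and the resulting lower bound on $\|X_i\|_2$ can be evaluated directly. The sketch below is illustrative only. The universal constant $\theta$ is not specified in the text, so it is a free parameter here. Natural logarithms and $s \ge 2$ are assumed, and the helper name `pv_design_bounds` is hypothetical.

```python
import math

def pv_design_bounds(p: int, s: int, alpha: float, theta: float):
    """Bounds (i)-(ii) on n and d, and the induced lower bound on ||X_i||_2.

    theta stands for the unspecified universal constant; natural logs; s >= 2.
    """
    base = theta * math.log(p) * math.log(s)
    d_max = base ** (1.0 + 1.0 / alpha)                       # (ii)
    n_max = s ** (1.0 + alpha) * base ** (2.0 + 2.0 / alpha)  # (i), equal to s^(1+alpha) * d_max^2
    col_l2_min = 1.0 / math.sqrt(d_max)                       # ||X_i||_2 = 1/sqrt(d) >= 1/sqrt(d_max)
    return n_max, d_max, col_l2_min
```

The identity $n_{\max} = s^{1+\alpha} d_{\max}^2$ makes explicit that the exponent of $s$ is $1+\alpha$, whereas the probabilistic construction of Section 3 needs only $n = O(s\log(p/s))$.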

2.2.1. Oracle. As a matter of fact, inequality (10) shows that we can estimate $X\beta^* \in \mathbb{R}^n$ with nearly the same precision as if one knew in advance the support of $\beta^* \in \mathbb{R}^p$. Indeed, consider the ordinary least squares estimator:

$$\beta^{ols} \in \arg\min_{\mathrm{supp}(\beta)\subset S} \|y - X\beta\|_2,$$

where $S$ denotes the support of the target $\beta^* \in \mathbb{R}^p$. Observe that this estimator uses prior knowledge of the support of $\beta^*$. For this reason, it serves as an optimal (oracle) benchmark. A simple calculation gives

$$\frac{1}{n}\,\mathbb{E}\|X\beta^{ols} - X\beta^*\|_2^2 = \sigma^2\frac{s}{n}.$$

Using (i), we deduce that inequality (10) is optimal up to an explicit multiplicative factor $\rho(s,p)$. Namely, it holds

$$\frac{1}{n}\|X\beta^* - X\beta^\ell\|_2^2 \le C\,\rho(s,p)\,\frac{1}{n}\,\mathbb{E}\|X\beta^{ols} - X\beta^*\|_2^2,$$

where $\rho(s,p) = \left((1+\alpha)\log s + (2+2/\alpha)\log(\theta\log p\log s)\right)\cdot s^{\alpha}(\theta\log p\log s)^{2+2/\alpha}$, and $C > 0$ is a numerical constant. This inequality shows that prediction using Parvaresh-Vardy code design is almost optimal. Indeed, the prediction error is, up to the factor $\rho(s,p)$, as good as the prediction error one would get knowing the support of the target. Furthermore, notice that the same comment holds for the Dantzig selector (see Section 2.3). As a matter of fact, all the comments of Section 2.2 extend to the Dantzig selector.
In order to compare our result to the standard results given by the Restricted Eigenvalue assumption [BRT09] and the coherence property [DET06], we give the following inequality:

$$\frac{1}{n}\|X\beta^* - X\beta^\ell\|_2^2 \le C\,t(s,p)\,\|X_1\|_2^2\,\sigma^2\,\frac{s\log p}{n}, \tag{12}$$

where $t(s,p) = s^{\alpha}(\theta\log p\log s)^{3+3/\alpha}\cdot\log\left(s^{1+\alpha}(\theta\log p\log s)^{2+2/\alpha}\right)/\log p$, and the numerical constant $C > 0$ is the same as in the previous inequality.
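Both factors are explicit and can be tabulated. The sketch below is illustrative only. As before, $\theta$ is the unspecified universal constant (a free parameter here), natural logarithms and $s \ge 2$ are assumed, and the helper name `oracle_factors` is hypothetical. Only the relative growth in $s$ and $p$ is meaningful.

```python
import math

def oracle_factors(p: int, s: int, alpha: float, theta: float):
    """The factors rho(s, p) and t(s, p) of Section 2.2 (natural logs, s >= 2)."""
    base = theta * math.log(p) * math.log(s)
    # log of the bound (i) on n: (1 + alpha) log s + (2 + 2/alpha) log(theta log p log s)
    log_n_max = (1 + alpha) * math.log(s) + (2 + 2 / alpha) * math.log(base)
    rho = log_n_max * s ** alpha * base ** (2 + 2 / alpha)
    t = s ** alpha * base ** (3 + 3 / alpha) * log_n_max / math.log(p)
    return rho, t
```

Written this way, $\rho(s,p) = \log(n_{\max})\, n_{\max}/s$ and $t(s,p) = \rho(s,p)\, d_{\max}/\log p$, where $n_{\max}$ and $d_{\max}$ are the bounds (i) and (ii). The extra factor $d_{\max}$ in $t$ comes from expressing the bound through $\|X_1\|_2^2 = 1/d$.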

2.2.2. Comparison with the coherence property approach. In 2007, E. J. Candès and Y. Plan obtained a remarkable estimate in error prediction for the lasso. They used a so-called coherence property following the work of D.L. Donoho et al. [DET06]. They showed (Theorem 1.2 in [CP09]) that, with high probability, for every design matrix satisfying the coherence property, it holds

$$\frac{1}{n}\|X\beta^* - X\beta^\ell\|_2^2 \le C'\,\|X_1\|_2^2\,\sigma^2\,\frac{s\log p}{n}, \tag{13}$$

where $C' > 0$ is some numerical constant. Note that the upper bounds (12) and (13) are similar up to the factor $t(s,p)$. The coherence is the maximum correlation between pairs of predictors. This property is fundamental and allows one to deal with random design matrices. We do not use this property here, though we get the same accuracy (up to the factor $t(s,p)$) and we extend their error prediction result to deterministic design matrices.

2.2.3. Comparison with the Restricted Eigenvalue approach. In the same way, Bickel, Ritov and Tsybakov [BRT09] established that, with high probability,

$$\frac{1}{n}\|X\beta^* - X\beta^\ell\|_2^2 \le C''\,\|X_1\|_2^2\,\sigma^2\,\frac{s\log p}{n},$$

where $C'' > 0$ is some numerical constant depending on the $(s,3)$-restricted eigenvalue $\kappa(s,3)$. Again, it is difficult to estimate $\kappa(s,3)$ for the adjacency matrix of an unbalanced expander graph. Observe that, up to the factor $t(s,p)$, we get the same accuracy.

2.2.4. Comparison with the $H_{s,1}(1/3)$ approach. As mentioned in the introduction, the $H_{s,1}(1/3)$ condition is devoted to regular consistency for the lasso and the Dantzig selector. In particular, the results in [JN10] should be compared to our result (11) in the previous corollary:

$$\|\beta^\ell_{S^c}\|_1 \le 784\,\sigma\, s^{1+\alpha}(\theta\log p\log s)^{\frac52+\frac{5}{2\alpha}}\sqrt{\log\left(s^{1+\alpha}(\theta\log p\log s)^{2+2/\alpha}\right)}\,\|X_1\|_2, \tag{14}$$

using (i) and (ii). Following Proposition 9 in [JN10] (with $\beta = \|X_1\|_2$, $\kappa = 1/3$, $p = 1$ in their notation, $\epsilon = \eta_n$, and $\hat\lambda$ as in (3)), A. Juditsky and A. Nemirovski show that

$$\|\beta^\ell_{S^c}\|_1 \le 192\sqrt{2}\,\sigma\, s^{1+\alpha}(\theta\log p\log s)^{2+2/\alpha}\sqrt{\log(p/\eta_n)}\,\|X_1\|_2. \tag{15}$$

Up to a logarithmic factor, the result (15) is of the same order as our result (14). We get the same accuracy as in [JN10].
2.2.5. Variable selection. Observe that (14) does not depend on the magnitude of the $s$-sparse target $\beta^* \in \mathbb{R}^p$. As a matter of fact, this inequality holds as soon as the Bound on the noise condition (7) is satisfied. Notice that the lasso estimator $\beta^\ell \in \mathbb{R}^p$ is stochastic since it depends on the noise $z$. Define $\mathrm{MVSE}_1(s,p)$, the Mean Variable Selection Error in the $\ell_1$-norm, as

$$\mathrm{MVSE}_1(s,p) := \sup_{|S|\le s}\ \sup\ \frac{\|\beta^\ell_{S^c}\|_1}{|S^c|},$$

where $S$ denotes the support of the target $\beta^* \in \mathbb{R}^p$, the latter supremum is taken over the event $\{\|X^\top z\|_\infty \le \Lambda\}$, and $|S^c|$ denotes the size of the complement of $S$. We investigate the behavior of $\mathrm{MVSE}_1(s,p)$ as $s$ and $p$ tend to infinity. Notice that, in this framework, the probability of the event $\{\|X^\top z\|_\infty \le \Lambda\}$ tends to 1. Hence $\mathrm{MVSE}_1(s,p)$ captures the worst-case variable selection error on an event of probability close to 1.

Using inequality (14), one can check that the following result holds. If there exists $0 < t < 1/(1+\alpha)$ such that $s = o(p^t)$, then

$$\mathrm{MVSE}_1(s,p) \xrightarrow[p\to+\infty]{} 0.$$

This shows that the mean $\ell_1$-error in variable selection tends to 0.

2.3. Error prediction and variable selection for the Dantzig Selector. We recall that the Dantzig selector is defined by (6).

Theorem 2.2. Let $X \in \mathbb{R}^{n\times p}$ be a design matrix satisfying the Uncertainty Principle condition (1) and the Bound on the noise condition (7). Let $\beta^* \in \mathbb{R}^p$ and $S \subset \{1,\dots,p\}$ the indices of its $s$ largest (in magnitude) coefficients. Take $\lambda \ge \Lambda$. Then, it holds

$$\|X\beta^* - X\beta^d\|_2^2 \le 4(\lambda + \Lambda)\left(16(\lambda + \Lambda)n + 3\|\beta^*_{S^c}\|_1\right),$$

with probability at least $1 - \eta_n$.

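Proof. Set $\gamma = \beta^* - \beta^d$. On the event $\{\|X^\top z\|_\infty \le \Lambda\}$, it yields

$$\begin{aligned}
\|X\gamma\|_2^2 &\le \|X^\top X\gamma\|_\infty\|\gamma\|_1\\
&= \|X^\top(y - X\beta^d) + X^\top(X\beta^* - y)\|_\infty\|\gamma\|_1\\
&\le (\lambda + \Lambda)\|\gamma\|_1,
\end{aligned}$$

since $X\beta^* - y = -z$. Hence we get

$$\|X\gamma\|_2^2 - (\lambda + \Lambda)\|\gamma_{S^c}\|_1 \le (\lambda + \Lambda)\|\gamma_S\|_1. \tag{16}$$

On the event $\{\|X^\top z\|_\infty \le \Lambda\}$ and for $\lambda \ge \Lambda$, the vector $\beta^*$ is clearly feasible (i.e. it satisfies the constraint $\|X^\top(y - X\beta)\|_\infty \le \lambda$). As a consequence, it holds $\|\beta^d\|_1 \le \|\beta^*\|_1$. Thus,

$$\|\beta^d_{S^c}\|_1 \le \left(\|\beta^*_S\|_1 - \|\beta^d_S\|_1\right) + \|\beta^*_{S^c}\|_1 \le \|\gamma_S\|_1 + \|\beta^*_{S^c}\|_1.$$

Since $\|\gamma_{S^c}\|_1 \le \|\beta^d_{S^c}\|_1 + \|\beta^*_{S^c}\|_1$, it yields

$$\|\gamma_{S^c}\|_1 \le \|\gamma_S\|_1 + 2\|\beta^*_{S^c}\|_1. \tag{17}$$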






YOHANN DE CASTRO 



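Combining (16) $+\ 3(\lambda + \Lambda)\times$(17), we get

$$\|X\gamma\|_2^2 + 2(\lambda + \Lambda)\|\gamma_{S^c}\|_1 \le 4(\lambda + \Lambda)\|\gamma_S\|_1 + 6(\lambda + \Lambda)\|\beta^*_{S^c}\|_1.$$

Using the Uncertainty Principle, it holds

$$\begin{aligned}
\|X\gamma\|_2^2 &\le 8(\lambda + \Lambda)\|X\gamma\|_1 + 6(\lambda + \Lambda)\|\beta^*_{S^c}\|_1\\
&\le 8(\lambda + \Lambda)\sqrt{n}\,\|X\gamma\|_2 + 6(\lambda + \Lambda)\|\beta^*_{S^c}\|_1.
\end{aligned}$$

Since $8(\lambda + \Lambda)\sqrt{n}\,\|X\gamma\|_2 \le \frac12\|X\gamma\|_2^2 + 32(\lambda + \Lambda)^2 n$, we deduce the inequality $\|X\gamma\|_2^2 \le 64(\lambda + \Lambda)^2 n + 12(\lambda + \Lambda)\|\beta^*_{S^c}\|_1$. Finally, it holds

$$\|X\beta^* - X\beta^d\|_2^2 \le 4(\lambda + \Lambda)\left(16(\lambda + \Lambda)n + 3\|\beta^*_{S^c}\|_1\right).$$

Since the event $\{\|X^\top z\|_\infty \le \Lambda\}$ has probability at least $1 - \eta_n$, this concludes the proof. □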

If $\beta^*$ is $s$-sparse, we derive the next result.

Corollary (sparse target). Let $\beta^*$ be an $s$-sparse vector. Take $\lambda \ge \Lambda$. Then, we have

$$\|X\beta^* - X\beta^d\|_2 \le 8(\lambda + \Lambda)\sqrt{n}, \tag{18}$$

with probability at least $1 - \eta_n$.

Moreover, if $\lambda = \Lambda$, then we derive the error prediction:

$$\|X\beta^* - X\beta^d\|_2 \le 32\,\sigma\sqrt{n\log n},$$

with probability at least $1 - \eta_n$.

As mentioned in Section 2.2, our result is optimal up to an explicit factor. In fact, we achieve nearly the same accuracy as one would get knowing in advance the support of $\beta^*$. By repeating the proof of Theorem 2.2, we derive a result in regular consistency.

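Proposition 2.3. Let $X \in \mathbb{R}^{n\times p}$ be a design matrix satisfying the Uncertainty Principle condition (1) and the Bound on the noise condition (7). Let $\beta^*$ be an $s$-sparse vector and $S \subset \{1,\dots,p\}$ be its support. Take $\lambda \ge \Lambda$. Then,

$$\|\beta^d_{S^c}\|_1 \le 32(\lambda + \Lambda)n,$$

with probability at least $1 - \eta_n$. Moreover, if $\lambda = \Lambda$, then we have

$$\|\beta^d_{S^c}\|_1 \le 128\,\sigma n\sqrt{\log n}, \tag{19}$$

with probability at least $1 - \eta_n$, where $\sigma^2$ is the variance of the noise.

Proof. Set $\gamma = \beta^* - \beta^d$. On the event $\{\|X^\top z\|_\infty \le \Lambda\}$, inequality (17) holds. Since $\beta^*$ is $s$-sparse, this inequality yields $\|\gamma_{S^c}\|_1 \le \|\gamma_S\|_1$. By the Uncertainty Principle, $\|\gamma_S\|_1 \le 2\|X\gamma\|_1 + \frac12\|\gamma_{S^c}\|_1 \le 2\|X\gamma\|_1 + \frac12\|\gamma_S\|_1$, so we deduce that $\|\gamma_{S^c}\|_1 \le \|\gamma_S\|_1 \le 4\|X\gamma\|_1 \le 4\sqrt{n}\,\|X\gamma\|_2$. We conclude by invoking (18) and noting that $\gamma_{S^c} = -\beta^d_{S^c}$. □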
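The explicit constants for $\lambda = \Lambda = 2\sigma\sqrt{\log n}$ follow by direct substitution into (18) and into the first bound of Proposition 2.3:

$$8(\lambda + \Lambda)\sqrt{n} = 16\,\Lambda\sqrt{n} = 32\,\sigma\sqrt{n\log n}, \qquad 32(\lambda + \Lambda)n = 64\,\Lambda n = 128\,\sigma n\sqrt{\log n}.$$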

The same analysis as in Section 2.2 holds for the Dantzig selector using Parvaresh- 
Vardy code design. 
3. Deterministic Design via Unbalanced Expander Graphs

In this section we introduce unbalanced expander graphs and recall the main results shown by Berinde et al. [BGI+08]. Unbalanced expander graphs satisfy a vertex expansion property. This property controls the neighborhood $\Gamma(I)$ of any sufficiently small subset $I$ of vertices on the left.

3.1. Adjacency matrix of a bipartite graph. We are concerned with designs $X \in \mathbb{R}^{n\times p}$ derived from the renormalized adjacency matrix of an unbalanced expander graph. We consider a bipartite graph $G = (A, B, E)$, where $A$ is the set of the left vertices, $B$ the set of the right vertices, and $E$ the set of the edges between $A$ and $B$. Denote by $p$ and $n$ respectively the cardinality of $A$ and $B$.



Figure 1. A bipartite graph G with regular left degree d. Each 
vertex in A has exactly d neighbors in B (here d = 2). 

A bipartite graph has regular left degree $d$ if and only if every vertex in $A$ has exactly $d$ neighbors in $B$, see Figure 1. Suppose that $G$ has regular left degree $d$. Then the renormalized adjacency matrix $X$ is

$$X_{ji} = \begin{cases} 1/d & \text{if } i \text{ is connected to } j,\\ 0 & \text{otherwise,}\end{cases} \tag{20}$$

where $i \in \{1,\dots,p\}$ and $j \in \{1,\dots,n\}$.

3.2. Restricted isometry property. In the expander framework, the size $n$ may depend on $p$ and other parameters of the graph. The vertex expansion property states that the neighborhood of $I$ has size 'almost' $d|I|$ as soon as $|I| \le s$, where $s$ is a parameter of the graph that can be as large as desired, see Figure 2. The formal definition of an unbalanced expander graph is as follows.

Definition 3 ($(s,\varepsilon)$-unbalanced expander). An $(s,\varepsilon)$-unbalanced expander is a bipartite simple graph $G = (A, B, E)$ with left degree $d$ such that for any $I \subset A$ with $|I| \le s$, the set of neighbors $\Gamma(I)$ of $I$ has size

$$|\Gamma(I)| \ge (1-\varepsilon)\,d\,|I|. \tag{21}$$

Subsequently we consider a parameter $\varepsilon$ such that $\varepsilon = 1/8$ (see Section 1.2). Notice that $\varepsilon$ is fixed and does not depend on the other parameters. In particular, we do not require that $\varepsilon$ goes to zero as $p$ goes to infinity. We call $\varepsilon$ the expansion constant. Using the expansion property (21), Berinde et al. [BGI+08] showed the following fundamental theorem:
Figure 2. The expansion property of an unbalanced expander graph: any sufficiently small subset $I$ on the left has a neighborhood $\Gamma(I)$ of size at least $(1-\varepsilon)\,d\,|I|$.

Theorem 3.1 (Restricted Isometry Property). Let $X \in \mathbb{R}^{n\times p}$ be the renormalized adjacency matrix of an $(s,\varepsilon)$-unbalanced expander. Then $X$ satisfies the following RIP1 property:

$$\forall \gamma \in \mathbb{R}^p,\quad (1-2\varepsilon)\|\gamma_S\|_1 \le \|X\gamma_S\|_1 \le \|\gamma_S\|_1,$$

where $S$ is any subset of $\{1,\dots,p\}$ of size at most $s$, and $\gamma_S$ the vector with coefficients equal to the coefficients of $\gamma$ in $S$ and zero outside.
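On toy graphs, the expansion property (21) can be verified by brute force over all left subsets of size at most $s$. The sketch below assumes that neighbor sets are given explicitly and that the left degree is regular. The helper name `is_unbalanced_expander` is hypothetical, and the cost is exponential in $s$, so this only illustrates the definition.

```python
import itertools

def is_unbalanced_expander(neighbors, s: int, eps: float) -> bool:
    """Brute-force check of (21): |Gamma(I)| >= (1 - eps) * d * |I| for every I with |I| <= s.

    neighbors[i] is the set of right vertices adjacent to left vertex i; all sets
    must have the same size d (regular left degree). Exponential in s: toy sizes only.
    """
    d = len(neighbors[0])
    if any(len(set(nb)) != d for nb in neighbors):
        raise ValueError("left degree must be regular, with distinct neighbors")
    for k in range(1, s + 1):
        for I in itertools.combinations(range(len(neighbors)), k):
            gamma_I = set().union(*(neighbors[i] for i in I))   # neighborhood Gamma(I)
            if len(gamma_I) < (1 - eps) * d * k:
                return False
    return True
```

Disjoint neighborhoods expand perfectly ($\varepsilon = 0$). Two left vertices with identical neighborhoods already break (21) for $s = 2$ and any $\varepsilon < 1/2$.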

In their article [BGI+08] (Lemma 16 and Theorem 17), Berinde et al. derive a useful lemma which is a consequence of the RIP1 property (see Lemma 1.1 in the introduction). In the case $\varepsilon \le 1/8$, this lemma shows that the renormalized adjacency matrix of a $(2s,\varepsilon)$-unbalanced expander graph satisfies the Uncertainty Principle condition (1).

3.3. Deterministic design. We present the work of Guruswami et al. [GUV09] on 
the explicit construction of unbalanced expander graphs. They recently proved 
[GUV09], based on the Parvaresh-Vardy codes [PV05], the following theorem. 

Theorem 3.2 (Explicit construction). There exists a universal constant $\theta_0 > 0$ such that the following holds. For all $\alpha > 0$ and for all $p, s, \varepsilon > 0$, there exists an $(s,\varepsilon)$-unbalanced expander graph $G = (A, B, E)$ with $|A| = p$, left degree

$$d \le \left(\frac{\theta_0\log p\log s}{\varepsilon}\right)^{1+1/\alpha},$$

and right side vertices (of size $n = |B|$) such that

$$n \le s^{1+\alpha}\left(\frac{\theta_0\log p\log s}{\varepsilon}\right)^{2+2/\alpha}. \tag{22}$$

Observe that the constant $\theta$ in Section 2.2 is exactly $\theta_0/\varepsilon = 8\theta_0$. As a matter of fact, all the results in this paper hold for the following deterministic construction of design:

• Choose $p$, the size of the target, and $s$, the sparsity level,

• Set $\varepsilon = 1/8$, the expansion constant, and $\alpha > 0$, a tuning parameter,

• Construct an $(s,\varepsilon)$-unbalanced expander graph $G$ from Parvaresh-Vardy codes,

• Set $X \in \mathbb{R}^{n\times p}$, the renormalized adjacency matrix of the graph $G$. Notice that the number of observations $n$ satisfies (22).
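The last step of this recipe, turning a graph into the design (20), is mechanical. The sketch below assumes NumPy and takes the graph as explicit neighbor lists. The Parvaresh-Vardy construction itself is not implemented here, and the helper name `renormalized_adjacency` is hypothetical.

```python
import numpy as np

def renormalized_adjacency(neighbors, n: int) -> np.ndarray:
    """Build X in (20): X[j, i] = 1/d if left vertex i is adjacent to right vertex j.

    neighbors[i] lists the d distinct right vertices (in range(n)) adjacent to left vertex i.
    """
    p = len(neighbors)
    d = len(neighbors[0])
    X = np.zeros((n, p))
    for i, nb in enumerate(neighbors):
        if len(set(nb)) != d:
            raise ValueError("every left vertex needs exactly d distinct neighbors")
        X[list(nb), i] = 1.0 / d
    return X
```

Each column then has $d$ entries equal to $1/d$, so its $\ell_1$-norm is 1 (the $\ell_1$-normalization condition) and its $\ell_2$-norm is $1/\sqrt{d}$, as in property (iv) of Section 2.2.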
In a probabilistic framework, the following proposition can be shown using Chernoff bounds [HX07].

Proposition 3.3 (Probabilistic construction). Consider $\varepsilon > 0$ and $p/2 \ge s$. Then, with positive probability, there exists an $(s,\varepsilon)$-unbalanced expander graph $G = (A, B, E)$ with $|A| = p$, left degree

$$d = O\left(\log(p/s)\right) \quad \text{as } p \to +\infty,$$

and number of right side vertices (namely $n = |B|$)

$$n = O\left(s\log(p/s)\right) \quad \text{as } p \to +\infty,$$

where the $O(\cdot)$ notation does not depend on $s$ but on $\varepsilon$.

In this paper, we denote by $n$ the number of measurements (i.e. the size of $B$). These results show that it is possible to construct explicit unbalanced expander graphs that are close, in terms of the bound on $n$, to the optimal graphs obtained probabilistically.
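To experiment with the random side, a common model picks, for each left vertex, $d$ distinct right vertices uniformly at random. This model is an assumption here: [HX07] may use a variant, and the sketch does not check expansion, which only holds with positive probability for suitable $d$ and $n$. The sketch assumes NumPy, and the helper name `random_left_regular_graph` is hypothetical.

```python
import numpy as np

def random_left_regular_graph(p: int, n: int, d: int, seed: int = 0):
    """Neighbor lists of a random bipartite graph in which each of the p left vertices
    picks d distinct right vertices (out of n) uniformly at random.
    Expansion is NOT verified; it holds only with positive probability for suitable d, n."""
    rng = np.random.default_rng(seed)
    return [sorted(rng.choice(n, size=d, replace=False).tolist()) for _ in range(p)]
```

The output has the same neighbor-list format as the brute-force expansion check and the adjacency builder sketched in Section 3, so small random instances can be fed to both.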

4. Bound on the noise

In this section, we give an upper bound on the noise amplification $\|X^\top z\|_\infty$. In particular, we assume that the design $X \in \mathbb{R}^{n\times p}$ satisfies the $\ell_1$-normalization condition and we show that it satisfies the Bound on the noise condition (see Section 2).

Lemma 4.1 (Non-Amplification). It holds $\forall z \in \mathbb{R}^n,\ \|X^\top z\|_\infty \le \|z\|_\infty$.

Proof. Let $\gamma \in \mathbb{R}^p$ such that $\|\gamma\|_1 = 1$. Since the design matrix satisfies the $\ell_1$-normalization condition, the triangle inequality gives $\|X\gamma\|_1 \le \|\gamma\|_1$. Observe that this inequality holds for all vectors, not only sparse vectors. Furthermore,

$$\|X^\top z\|_\infty = \max_{\|\gamma\|_1 \le 1}\langle X^\top z, \gamma\rangle = \max_{\|\gamma\|_1 \le 1}\langle z, X\gamma\rangle \le \max_{\|\gamma\|_1 \le 1}\left\{\|z\|_\infty\|X\gamma\|_1\right\} \le \|z\|_\infty,$$

where $\langle\cdot,\cdot\rangle$ is the standard Euclidean inner product. This last inequality concludes the proof. □
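A quick numerical spot check of Lemma 4.1 follows. It illustrates the lemma and does not prove it. The sketch assumes NumPy and uses a random Gaussian matrix with $\ell_1$-normalized columns (hypothetical sizes $n = 30$, $p = 200$) in place of an expander design. The lemma only uses the $\ell_1$-normalization condition, so signed entries are allowed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.standard_normal((n, p))
X /= np.abs(X).sum(axis=0)        # l1-normalize every column (signs are allowed)
z = rng.standard_normal(n)

lhs = np.abs(X.T @ z).max()       # ||X^T z||_inf
rhs = np.abs(z).max()             # ||z||_inf
print(lhs <= rhs)                 # True, as Lemma 4.1 predicts
```

Entrywise this is Hölder's inequality, $|X_i^\top z| \le \|X_i\|_1\|z\|_\infty = \|z\|_\infty$, which is why the bound no longer involves $p$.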

Hence, to upper bound $\|X^\top z\|_\infty$, it is enough to estimate $\|z\|_\infty$. This remark allows us to reduce the dimension of the ambient space from $p$ to $n$.

Lemma 4.2 (Bound on the noise). Suppose that $z = (z_i)_{i=1}^n$ is a centered Gaussian noise such that the $z_i$'s could be correlated, and for all $i \in \{1,\dots,n\}$, we have $z_i \sim \mathcal{N}(0,\sigma^2)$. Then, for $\Lambda = 2\sigma\sqrt{\log n}$,

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda\right) \ge 1 - \frac{1}{\sqrt{2\pi}\,n\sqrt{\log n}}.$$

Proof. Denote by $(z_i)_{i=1,\dots,n}$ the coefficients of $z$. Lemma 4.1 gives

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda\right) \ge \mathbb{P}\left(\|z\|_\infty \le \Lambda\right). \tag{23}$$









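Using Šidák's inequality [Sid68] in (23), it holds:

$$\mathbb{P}\left(\|z\|_\infty \le \Lambda\right) \ge \mathbb{P}\left(\|\tilde z\|_\infty \le \Lambda\right) = \prod_{i=1}^n \mathbb{P}\left(|\tilde z_i| \le \Lambda\right),$$

where the $\tilde z_i$'s are independent and have the same law as the $z_i$'s. Denote by $\Phi$ and $\varphi$ respectively the cumulative distribution function and the probability density function of the standard normal. Set $\delta = 2\sqrt{\log n}$. It holds

$$\prod_{i=1}^n \mathbb{P}\left(|\tilde z_i| \le \Lambda\right) = \mathbb{P}\left(|\tilde z_1| \le \Lambda\right)^n = \left(2\Phi(\delta) - 1\right)^n \ge \left(1 - 2\varphi(\delta)/\delta\right)^n,$$

using an integration by parts to get $1 - \Phi(\delta) \le \varphi(\delta)/\delta$. By Bernoulli's inequality, it yields that

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda\right) \ge \left(1 - 2\varphi(\delta)/\delta\right)^n \ge 1 - 2n\,\frac{\varphi(\delta)}{\delta} = 1 - \frac{1}{\sqrt{2\pi}\,n\sqrt{\log n}}.$$

This concludes the proof. □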
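For independent noise, the probability bounded in Lemma 4.2 has a closed form, $\left(2\Phi(\delta) - 1\right)^n = \mathrm{erf}(\delta/\sqrt{2})^n$, which can be compared with $1 - \eta_n$. In the correlated case, Šidák's inequality is what transfers the bound. The sketch below uses only the standard library, assumes $n \ge 2$, and its helper names are hypothetical.

```python
import math

def eta(n: int) -> float:
    """eta_n = 1 / (sqrt(2 pi) n sqrt(log n)), the failure probability in Lemma 4.2 (n >= 2)."""
    return 1.0 / (math.sqrt(2.0 * math.pi) * n * math.sqrt(math.log(n)))

def prob_sup_norm_independent(n: int) -> float:
    """P(||z~||_inf <= Lambda) for n independent N(0, sigma^2) entries and
    Lambda = 2 sigma sqrt(log n), using P(|N(0,1)| <= delta) = erf(delta / sqrt(2))."""
    delta = 2.0 * math.sqrt(math.log(n))
    return math.erf(delta / math.sqrt(2.0)) ** n

for n in (10, 100, 1000):
    print(n, prob_sup_norm_independent(n), 1.0 - eta(n))
```

The exact value stays slightly above $1 - \eta_n$. The gap comes from the Mills-ratio step $1 - \Phi(\delta) \le \varphi(\delta)/\delta$ and from Bernoulli's inequality.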

This upper bound is valuable to give oracle inequalities, as seen in the previous sections. For the sake of readability, denote

$$\eta_n = \frac{1}{\sqrt{2\pi}\,n\sqrt{\log n}}.$$

All the probabilities appearing in our theorems are of the form $1 - \eta_n$. Since $n$ denotes the number of observations, $\eta_n$ is very small: it is less than $1/1000$ as soon as $n \ge 176$, which covers most common problems. Furthermore, by repeating the same argument as in Lemma 4.2, we have the next proposition.

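Proposition 4.3. Suppose that $z = (z_i)_{i=1}^n$ is a centered Gaussian noise with variance $\sigma^2$ such that the $z_i$'s are $\mathcal{N}(0,\sigma^2)$-distributed and could be correlated. Then, for $t \ge 1$ and

$$\Lambda_t = (1+t)\,\sigma\sqrt{\log n},$$

it holds

$$\mathbb{P}\left(\|X^\top z\|_\infty \le \Lambda_t\right) \ge 1 - \frac{2}{\sqrt{2\pi}\,(1+t)\sqrt{\log n}\;n^{\frac{(1+t)^2}{2} - 1}}. \tag{24}$$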

By replacing $\Lambda$ by $\Lambda_t$ in the statements of our theorems, it is possible to replace all the probabilities of the form $1 - \eta_n$ by probabilities of the form (24). Observe that the corresponding failure probabilities can be made as small as desired by increasing $t$.
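Repeating the argument of Lemma 4.2 with $\delta_t = (1+t)\sqrt{\log n}$ gives the failure term $2n\,\varphi(\delta_t)/\delta_t$. The sketch below computes this term directly from the proof, rather than from a closed form, using only the standard library. It assumes $n \ge 2$, and the helper name `failure_bound` is hypothetical.

```python
import math

def failure_bound(n: int, t: float) -> float:
    """2 n phi(delta_t) / delta_t with delta_t = (1 + t) sqrt(log n): the argument of
    Lemma 4.2 repeated with Lambda_t = (1 + t) sigma sqrt(log n) in place of Lambda (n >= 2)."""
    delta = (1.0 + t) * math.sqrt(math.log(n))
    phi = math.exp(-delta ** 2 / 2.0) / math.sqrt(2.0 * math.pi)   # standard normal density
    return 2.0 * n * phi / delta
```

Since $\exp(-\delta_t^2/2) = n^{-(1+t)^2/2}$, this term equals the one subtracted in (24). At $t = 1$ it reduces to $\eta_n$. For instance, $t = 2$ already gives a failure probability of order $n^{-7/2}/\sqrt{\log n}$.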